UFRGS&LIF at SemEval-2016 Task 10: Rule-Based MWE Identification and Predominant-Supersense Tagging
Abstract
This paper presents our approach to SemEval-2016 Task 10 – Detecting Minimal Semantic Units and their Meanings. Systems are expected to provide a representation of lexical semantics by (1) segmenting tokens into words and multiword units and (2) providing a supersense tag for segments that function as nouns or verbs. Our rule-based pipeline system uses no external resources and was implemented using the mwetoolkit. First, we extract and filter known MWEs from the training corpus. Second, we group input tokens of the test corpus based on this lexicon, with special treatment for non-contiguous expressions. Third, we use an MWE-aware predominant-sense heuristic for supersense tagging. We obtain an F-score of 51.48% for MWE identification and 49.98% for supersense tagging.
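The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the mwetoolkit: the function names (`build_lexicon`, `segment`, `tag`), the data layout (each training sentence as a list of `(lemma_tuple, supersense_or_None)` segments), and the `min_count` filter are all assumptions made for illustration. The sketch matches contiguous expressions only, so the special treatment of non-contiguous expressions mentioned above is left out.

```python
from collections import Counter, defaultdict


def build_lexicon(train_sents, min_count=1):
    """Collect known MWEs and the predominant supersense of each segment.

    Hypothetical input format: each sentence is a list of segments,
    each segment a (tuple_of_lemmas, supersense_or_None) pair.
    """
    mwe_counts = Counter()
    sense_counts = defaultdict(Counter)
    for sent in train_sents:
        for lemmas, sense in sent:
            if len(lemmas) > 1:
                mwe_counts[lemmas] += 1
            if sense is not None:
                sense_counts[lemmas][sense] += 1
    # Filter step: keep MWEs seen at least min_count times (assumed criterion).
    mwes = {m for m, c in mwe_counts.items() if c >= min_count}
    # Predominant sense: the most frequent supersense observed per segment.
    predominant = {seg: c.most_common(1)[0][0] for seg, c in sense_counts.items()}
    return mwes, predominant


def segment(tokens, mwes):
    """Greedy longest-match grouping of contiguous tokens into known MWEs."""
    max_len = max((len(m) for m in mwes), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, down to length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            cand = tuple(tokens[i:i + n])
            if cand in mwes:
                out.append(cand)
                i += n
                break
        else:
            # No MWE starts here: emit a single-word segment.
            out.append((tokens[i],))
            i += 1
    return out


def tag(segments, predominant):
    """Assign each segment its predominant supersense, or None if unseen."""
    return [(seg, predominant.get(seg)) for seg in segments]


train = [
    [(("take", "off"), "v.motion"), (("plane",), "n.artifact")],
    [(("take", "off"), "v.motion"), (("plane",), "n.artifact")],
    [(("plane",), "n.shape")],
]
mwes, predominant = build_lexicon(train)
segments = segment(["the", "plane", "take", "off"], mwes)
tagged = tag(segments, predominant)
```

Because the sense lookup is keyed on whole segments, a multiword unit receives the supersense observed for the expression as a whole and not that of its head word, which is one plausible reading of "MWE-aware" in the abstract.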
Similar resources
ICL-HD at SemEval-2016 Task 10: Improving the Detection of Minimal Semantic Units and their Meanings with an Ontology and Word Embeddings
This paper presents our system submitted for SemEval 2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM; Schneider, Hovy, et al., 2016). We extend AMALGrAM (Schneider and Smith, 2015) by tapping two additional information sources. The first information source uses a semantic knowledge base (YAGO3; Suchanek et al., 2007) to improve supersense tagging (SST) for named entiti...
UTU at SemEval-2016 Task 10: Binary Classification for Expression Detection (BCED)
The SemEval 2016 DiMSUM Shared Task concerns the detection of minimal semantic units from text and prediction of their coarse lexical categories known as supersenses. Our approach is to define this task as a binary classification problem approachable by straightforward machine learning methods. We start by detecting semantic units by matching text spans against several large dictionaries, inclu...
INF-UFRGS-OPINION-MINING at SemEval-2016 Task 6: Automatic Generation of a Training Corpus for Unsupervised Identification of Stance in Tweets
This paper describes a weakly supervised solution for detecting stance in tweets, submitted to the SemEval 2016 Stance Task. Our approach is based on the premise that stance can be exposed as positive or negative opinions, although not necessarily about the stance target itself. Our system receives as input ngrams representing opinion targets and common terms used to denote stance (e.g. hashtags...
WHUNlp at SemEval-2016 Task DiMSUM: A Pilot Study in Detecting Minimal Semantic Units and their Meanings using Supervised Models
This paper describes our approach towards the SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM). We consider that the two problems are similar to multiword expression detection and supersense tagging, respectively. The former problem is formalized as a sequence labeling problem solved by first-order CRFs, and the latter one is formalized as a classification prob...
UW-CSE at SemEval-2016 Task 10: Detecting Multiword Expressions and Supersenses using Double-Chained Conditional Random Fields
We describe our entry to SemEval 2016 Task 10: Detecting Minimal Semantic Units and their Meanings. Our approach uses a discriminative first-order sequence model similar to Schneider and Smith (2015). The chief novelty in our approach is a factorization of the labels into multiword expression and supersense labels, and restricting first-order dependencies within these two parts. Our submitted m...